Ultra-Fast Matrix Multiplication: An Empirical Analysis of Highly Optimized Vector Algorithms

Author

  • Boyko Kakaradov
Abstract

Matrices play an important role in mathematics and computer science, but more importantly, they are ubiquitous in our daily lives because they are instrumental in the efficient manipulation and storage of digital data. Matrix multiplication is essential not only in graph theory but also in applied fields such as computer graphics and digital signal processing (DSP). DSP chips are found in virtually all cell phones and digital cameras, and matrix operations are central to how these chips process sounds and images in digital form so that they can be stored or transmitted electronically.

Fast matrix multiplication is still an open problem, but implementing existing algorithms [5] is a more common area of development than designing new ones [6]. Strassen's algorithm improves on the naive algorithm for multiplying two 2×2 matrices because it uses only seven scalar multiplications instead of the usual eight. Strassen's algorithm has been shown to be optimal for 2×2 matrices [6], and there have been asymptotic improvements over it for very large matrices. Even so, the search for improvements over Strassen's algorithm at smaller sizes continues. Strassen's algorithm itself is not considered an efficient reduction in practice, because it pays off only for large matrices and requires their dimensions to be powers of two; other sizes must be padded.

The following two sections present the naive, Winograd's, and Strassen's algorithms, along with discussions of the theoretical bounds for each. We then confirm the theoretical analysis of the algorithms' running times and present the results of an empirical study. In the final two sections, we present an improved Hybrid algorithm, which combines Strassen's asymptotic advantage with Winograd's practical performance. We also discuss the stunning performance of the AltiVec-optimized Strassen's algorithm.
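To make the seven-product idea and the power-of-two requirement concrete, here is a minimal Python sketch. It assumes square matrices stored as nested lists and zero-pads them to the next power of two. It recurses with Strassen's seven block products and falls back to a plain triple loop once blocks reach a `cutoff` size.

The names `naive_multiply`, `strassen_square`, `multiply` and the `cutoff` parameter are hypothetical illustrations, not the paper's code. The naive kernel is swapped in for the base case here; the Hybrid algorithm described above reportedly switches to Winograd's algorithm instead. This is a sketch, not the author's implementation.

```python
from typing import List

Matrix = List[List[float]]


def naive_multiply(a: Matrix, b: Matrix) -> Matrix:
    """Textbook triple loop: n^3 scalar multiplications for n x n inputs."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            aik = a[i][k]
            for j in range(p):
                c[i][j] += aik * b[k][j]
    return c


def _add(x: Matrix, y: Matrix) -> Matrix:
    return [[u + v for u, v in zip(rx, ry)] for rx, ry in zip(x, y)]


def _sub(x: Matrix, y: Matrix) -> Matrix:
    return [[u - v for u, v in zip(rx, ry)] for rx, ry in zip(x, y)]


def _split(m: Matrix):
    """Split an even-sized square matrix into its four quadrants."""
    h = len(m) // 2
    return ([r[:h] for r in m[:h]], [r[h:] for r in m[:h]],
            [r[:h] for r in m[h:]], [r[h:] for r in m[h:]])


def strassen_square(a: Matrix, b: Matrix, cutoff: int) -> Matrix:
    """Strassen recursion on power-of-two sizes; small blocks use the naive kernel."""
    n = len(a)
    if n <= cutoff:
        # Hypothetical hybrid switch point (the paper's Hybrid uses Winograd here).
        return naive_multiply(a, b)
    a11, a12, a21, a22 = _split(a)
    b11, b12, b21, b22 = _split(b)
    # Seven block products instead of the eight a blockwise naive product needs.
    m1 = strassen_square(_add(a11, a22), _add(b11, b22), cutoff)
    m2 = strassen_square(_add(a21, a22), b11, cutoff)
    m3 = strassen_square(a11, _sub(b12, b22), cutoff)
    m4 = strassen_square(a22, _sub(b21, b11), cutoff)
    m5 = strassen_square(_add(a11, a12), b22, cutoff)
    m6 = strassen_square(_sub(a21, a11), _add(b11, b12), cutoff)
    m7 = strassen_square(_sub(a12, a22), _add(b21, b22), cutoff)
    c11 = _add(_sub(_add(m1, m4), m5), m7)
    c12 = _add(m3, m5)
    c21 = _add(m2, m4)
    c22 = _add(_add(_sub(m1, m2), m3), m6)
    # Stitch the four result quadrants back into one matrix.
    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom


def multiply(a: Matrix, b: Matrix, cutoff: int = 32) -> Matrix:
    """Multiply two n x n matrices, zero-padding to the next power of two."""
    if cutoff < 1:
        raise ValueError("cutoff must be at least 1")
    n = len(a)
    size = 1
    while size < n:
        size *= 2

    def pad(m: Matrix) -> Matrix:
        return ([row + [0] * (size - n) for row in m]
                + [[0] * size for _ in range(size - n)])

    c = strassen_square(pad(a), pad(b), cutoff)
    return [row[:n] for row in c[:n]]
```

With `cutoff=1` and 2×2 inputs, the base case is reached exactly seven times, one scalar multiplication each. That is the seven-versus-eight saving the text describes. In practice the cutoff would be tuned empirically, which is the trade-off a hybrid scheme exploits.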


Similar sources

A New Parallel Matrix Multiplication Method Adapted on Fibonacci Hypercube Structure

The objective of this study was to develop a new optimal parallel algorithm for matrix multiplication that could run on a Fibonacci Hypercube structure. Most popular algorithms for parallel matrix multiplication cannot run on a Fibonacci Hypercube structure; therefore, a method that can run on all structures, especially the Fibonacci Hypercube structure, is necessary for parallel matr...


A New Method for Forecasting Uniaxial Compressive Strength of Weak Rocks

The uniaxial compressive strength of weak rocks (UCSWR) is among the essential parameters involved in the design of underground excavations, surface and underground mines, foundations in/on rock masses, and oil wells, serving as an input factor for analytical and empirical methods such as RMR and RMI. The direct standard approaches are difficult, expensive, and time-consuming, especially with highl...


A Superfast Algorithm for Confluent Rational Tangential Interpolation Problem via Matrix-vector Multiplication for Confluent Cauchy-like Matrices∗

Various problems in pure and applied mathematics and engineering can be reformulated as linear algebra problems involving dense structured matrices. The structure of these dense matrices is understood in the sense that their n² entries can be completely described by a smaller number, O(n), of parameters. Manipulating these parameters directly allows us to design efficient fast algorithms. One...


Decoding Generalized Reed-Solomon Codes and Its Application to RLCE Encryption Scheme

This paper compares the efficiency of various algorithms for implementing public key encryption scheme RLCE on 64-bit CPUs. By optimizing various algorithms for polynomial and matrix operations over finite fields, we obtained several interesting (or even surprising) results. For example, it is well known (e.g., Moenck 1976 [13]) that Karatsuba’s algorithm outperforms classical polynomial multip...



Publication date: 2004